ABSTRACT


Key frame extraction, the summarization of videos for applications such as
video object recognition and classification, video retrieval and archival,
and surveillance, is an active research area in computer vision. In this paper
we describe a new criterion for representative key frames and, correspondingly,
a key frame selection algorithm based on a two-stage method. The two-stage
method extracts accurate key frames that cover the content of the
whole video sequence. Firstly, an alternative sequence is obtained based on the
color characteristic difference between adjacent frames of the original sequence.
Secondly, by analyzing the structural characteristic difference between adjacent
frames of the alternative sequence, the final key frame sequence is
obtained. Finally, an optimization step based on the number of final
key frames is added in order to ensure the effectiveness of key frame extraction.
INTRODUCTION

Video segmentation and key frame extraction are the bases of video analysis
and content-based video retrieval. Key frame extraction is an essential part of
video analysis and management, providing a summary of an entire video for
video indexing, browsing, and retrieval. It represents the content of a video
sequence by selecting a set of summary key frames. Key frame extraction
techniques can be roughly categorized into four types [1]: those based on shot
boundaries, visual information, movement analysis, and clustering. Extraction
can sometimes also be performed in the compressed domain [2].
Nowadays, cluster-based methods are the most widely applied in
video content analysis research. In these methods, key frame
extraction is usually modeled as a typical clustering process
that divides one video shot into several clusters, and then one
or more frames are extracted based on low- or high-level
features. Methods that operate in the compressed domain are
usually not suitable for the diverse formats of videos from the
Internet, and transcoding may increase time complexity and
inaccuracy. The focus of this work is to represent the video
content adequately and quickly. In this paper, an active
detection method is proposed. First, the key frame is defined
for video copyright protection. Then, a key frame
extraction algorithm based on a two-stage method with low-level
features is proposed. The distinct features of the algorithm
are as follows. (1) The definition of key frame is specific to video
copyright protection. (2) The method has lower
computational complexity. (3) The method is robust for online
videos regardless of video format, video resolution, and so
on. The overall process of this paper is summarized in the figure.
The Proposed Key Frame Extraction Method
A. Definition of Key Frame for Video Copyright Protection.

Key frames for video copyright protection have some distinct
features, so the key frame is defined first, before video
preprocessing and key frame extraction. The key frames
should meet the following three conditions. (1) The gray
value of a key frame is within a certain range, so that
viewers can subjectively perceive the video
content. (2) The final key frame sequence must be arranged
in chronological order, consistent with the original video
sequence, in order to preserve temporal features and to
differ from a short promotion trailer. (3) Appropriate
redundancy among key frames is allowed, to ensure coverage of the
periods or intervals over which the video content unfolds.

These conditions are motivated as follows. In
general, radio and television programs need to convey
certain visual content; video images that are too dark
or too bright do not meet this subjective requirement. Such
images sometimes occur during gradual shot transitions.
In order to distinguish program trailers from other
programs, the intervals between extracted key frames must
be consistent with those of the frames in the original video.
Because pirated online videos are often divided into smaller video
files for playback, key frame extraction should
allow appropriate redundancy so that each period of time is covered.
Taking a talent show as an example, a review screen from the
moderator may appear for every contestant in a competition;
the critical information in the video frames at each such time
is then preserved by the key frame extraction processing.
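
Conditions (1) and (2) can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the pixel representation (a list of RGB tuples), the BT.601 gray conversion, the function names, and the bounds `low` and `high` are all assumptions made for illustration, since the text only says the gray value lies "within a certain range".

```python
def mean_gray(frame):
    """Average gray value of a frame given as a list of (r, g, b) pixels."""
    total = 0.0
    for r, g, b in frame:
        # ITU-R BT.601 luma weights, a common RGB-to-gray conversion
        # (an assumption; the paper does not name a conversion).
        total += 0.299 * r + 0.587 * g + 0.114 * b
    return total / len(frame)


def is_viewable(frame, low=40.0, high=220.0):
    """Condition (1): reject frames that are too dark or too bright.

    The bounds `low` and `high` are illustrative, not values from the paper.
    """
    return low <= mean_gray(frame) <= high


def in_chronological_order(indices):
    """Condition (2): key frame indices follow the original timeline."""
    return all(a < b for a, b in zip(indices, indices[1:]))
```

Condition (3), appropriate redundancy, is a property of how selective the thresholds are rather than a per-frame test, so it is not checked here.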

B. Two-Stage Method for Key Frame Extraction. 

The overall flow of key frame extraction for digital video
copyright protection is as follows. First, a digital video is decomposed
into video frames. Videos downloaded from the network
come in several formats, such as f4v, flv, and mp4. In
order to improve the generality of the key frame extraction
algorithm, the present method does not consider the specific
format and video stream structure; the video is decoded
before it is decomposed into frames. Key frame
extraction is divided into two steps.
Firstly, an alternative key frame sequence is obtained based on the
color characteristic difference between frames of the original
video; then the key frame sequence is obtained according
to the structural characteristic difference between frames of the
alternative key frame sequence; finally, the result is checked
against the number of key frames in order to ensure the
effectiveness of the key frames. Based on the above
considerations, the frame difference method is used to
extract key frames by analyzing spatial
redundancy and temporal redundancy. It is worth mentioning
that, in order to improve operational efficiency, this
method differs from the traditional shot segmentation
approach, which first segments the
video into shots, then extracts key frames from
each shot, and finally composes the key frame sequence of the
video. In this method, shot segmentation is skipped
and key frames are extracted directly from the video.
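
The control flow of the two stages can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the function name, the interface of the caller-supplied `color_diff` and `struct_diff` functions, the threshold names `t_c` and `t_s`, and the choice to always keep the first candidate in stage two are hypothetical.

```python
def two_stage_key_frames(frames, color_diff, struct_diff, t_c, t_s):
    """Return indices of key frames in chronological order.

    color_diff and struct_diff map two frames to a non-negative difference
    score (assumed interface); t_c and t_s are the matching thresholds.
    """
    # Stage 1: compare adjacent frames of the original sequence by color.
    candidates = [
        i for i in range(1, len(frames))
        if color_diff(frames[i - 1], frames[i]) > t_c
    ]
    if not candidates:
        return []

    # Stage 2: compare adjacent frames of the candidate sequence by structure.
    # Keeping the first candidate unconditionally is an assumption.
    final = [candidates[0]]
    for prev, cur in zip(candidates, candidates[1:]):
        if struct_diff(frames[prev], frames[cur]) > t_s:
            final.append(cur)
    return final
```

Because no shot segmentation is performed, each stage is a single pass over its input sequence, and the output indices are increasing by construction.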

Alternative Key Frame Sequence Based on Color Features.

Color is one of the important properties of an image and is
often used to characterize image statistics. For video from
some specific domains, color information can even
express semantics directly; in soccer video, for example,
green usually represents grass. In addition, different color
spaces are perceived differently by the human visual
system. In order to balance
the effectiveness and the speed of key frame extraction,
the RGB color space is used and the color histogram of each
frame is calculated. The color histogram difference
between adjacent frames is then adopted in the present method.
Judged by the number of key frames extracted, the color feature
method detects obvious changes of video content
well, but its results are not ideal for gradual
color changes and lighting effects.
The color histogram is very sensitive to gradual transitions and
lighting effects: over a few dozen frames,
the video content may change little
between adjacent frames while the color histogram features
change significantly. As
previously stressed, in order to perform key frame extraction
quickly and effectively, video shot segmentation
is not adopted directly. Although motion estimation,
optical flow analysis, and motion modeling are
effective in previous methods, their time complexity is
too high; this seriously limits their
practical application in video copyright monitoring.
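
A minimal sketch of the color stage follows, written against plain Python lists so that it runs without OpenCV. The bin count, the L1 distance between histograms, the scaling to [0, 1], and the default threshold `t_c` are assumptions for illustration; the text specifies only that RGB color histograms of adjacent frames are compared.

```python
def rgb_histogram(frame, bins=8):
    """Per-channel histogram of a frame, each channel normalized to sum to 1.

    frame: list of (r, g, b) pixels with integer values in 0..255.
    """
    width = 256 // bins
    hist = [0.0] * (3 * bins)
    for pixel in frame:
        for channel, value in enumerate(pixel):
            hist[channel * bins + min(value // width, bins - 1)] += 1.0
    n = float(len(frame))
    return [count / n for count in hist]


def histogram_difference(frame_a, frame_b, bins=8):
    """L1 distance between the two histograms, scaled to the range [0, 1]."""
    ha = rgb_histogram(frame_a, bins)
    hb = rgb_histogram(frame_b, bins)
    # Each of the 3 channels can contribute at most 2 to the L1 distance.
    return sum(abs(a - b) for a, b in zip(ha, hb)) / 6.0


def candidate_key_frames(frames, t_c=0.3):
    """Indices whose color histogram differs from the previous frame by > t_c."""
    return [
        i for i in range(1, len(frames))
        if histogram_difference(frames[i - 1], frames[i]) > t_c
    ]
```

For example, a sequence of two uniformly red frames followed by two uniformly blue frames yields the single candidate index 2, the first frame after the change.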

Final Key Frame Sequence Based on Structure Features. 

The key frame sequence is optimized based on structural
features. The method first extracts alternative key frames based
on color features and then optimizes the extracted key
frames based on structural features; that is, the
structural similarity between adjacent frames of the alternative
key frame sequence is determined in order to further reduce the
number of key frames. The
method is derived from the structural similarity
method for image quality evaluation, which measures the
similarity of two images; a value closer to 1 indicates
that the two images are more similar. Structural
similarity theory states that natural image signals are highly
structured and that there is a strong correlation between
pixels, especially pixels that are closest in the spatial domain;
this correlation carries important information about the structure
of visual objects in the scene. The main function of the human
visual system (HVS) is to
extract structural information from the field of view, so a
measure of structural information can be used as an
approximation of perceived image quality. In this scheme, the
concept of structural similarity is introduced into the key frame
optimization process, thereby addressing the problem that
color-feature-based key frame extraction is not sensitive to the
structural information of the frame. The method uses only the
structure component of the structural similarity index. From
the perspective of image composition, structural
information is defined in the theory of the structural similarity
index as a component independent of
brightness and contrast, and it reflects the properties of
objects in the scene. The covariance is selected as the structural
similarity metric. The main calculation is as follows.

The covariance serves as the structural similarity measure; it
enters the correlation coefficient of the image blocks x and y.
The covariance of x and y is calculated as

$$\sigma_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y) \tag{1}$$

where N is the number of samples in a block and $\mu_x$, $\mu_y$ are
the average values of x and y. In the alternative key frame sequence,
the preceding frame is taken as the original image, and the adjacent
frame is taken as the test image. For two corresponding image
blocks at the same position, x (in the original image) and y
(in the test image), the structure similarity component
between the two image blocks is calculated as

$$s(x, y) = \frac{\sigma_{xy} + C}{\sigma_x \sigma_y + C} \tag{2}$$

where $C = (KL)^2 / 2$, $K \ll 1$, $L \in (0, 255]$, and $\sigma_x$,
$\sigma_y$ are the standard deviations of x and y, respectively.

If the structural difference between x and y is small (that is,
s(x, y) is close to 1), the content of the two frames is not
meaningfully different, so they need not both be retained as key
frames; only one of them is kept in the optimized key frame sequence.
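
The structure component can be sketched in a few lines. This is an illustration, not the authors' implementation: blocks are assumed to be flat lists of gray values, the function name is hypothetical, and the default `k=0.03` is the constant commonly used with the structural similarity index rather than a value stated in this paper.

```python
import math


def structure_component(x, y, k=0.03, dynamic_range=255):
    """Structure component s(x, y) for two equally sized image blocks.

    x, y: flat lists of gray values. k and dynamic_range play the roles of
    K and L in the constant C = (K * L)^2 / 2; k=0.03 is an assumed default.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("blocks must have the same size, at least 2 samples")
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Sample covariance and variances, all with the 1 / (N - 1) factor.
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    c = (k * dynamic_range) ** 2 / 2
    return (cov + c) / (math.sqrt(var_x) * math.sqrt(var_y) + c)
```

Identical blocks give a value of 1, while a block compared with its reversed copy gives a negative value, because the covariance changes sign. A structural difference for thresholding could then be taken as `1 - structure_component(x, y)`, which is one possible convention rather than something the text prescribes.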

Optimization Based on the Number of Key Frames. 

After alternative key frames have been extracted based on color
features and key frames based on structural features, the
number of key frames is checked against the
demand. If no key frame is extracted from a video, an
appropriate number of key frames is extracted from the
original video at equal time intervals.
This usually occurs in video without shot changes, such as
a newscast segment containing only an anchor shot, where there
are no significant changes in color and structural features
between video frames.
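
The fallback can be sketched as follows. The function name, the parameter `wanted`, and the choice of the middle frame of each interval are assumptions for illustration; the text states only that frames are taken at equal time intervals when no key frame was found.

```python
def ensure_key_frames(key_frames, total_frames, wanted=5):
    """Fall back to equally spaced frames when extraction found none.

    key_frames: indices produced by the two stages (possibly empty).
    wanted: number of frames to sample in the fallback (assumed parameter).
    """
    if key_frames or total_frames <= 0:
        return list(key_frames)
    count = min(wanted, total_frames)
    step = total_frames / count
    # Take the middle frame of each of the `count` equal-length intervals.
    return [int(step * i + step / 2) for i in range(count)]
```

With a constant frame rate, equal index spacing corresponds to equal time spacing, so a 100-frame clip with `wanted=4` yields indices 12, 37, 62, and 87.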
C. Experiments and Analysis

The method was applied to a large number of online videos downloaded
from several video websites and to digital linear tapes
from Shanghai Media Group. The algorithm was
implemented in C++ with OpenCV 2.0, and the
experiments were conducted on a Windows 7 system with
an Intel i7 processor and 16 GB RAM. Firstly, we took the
television show "SUPER DIVA" to verify the effectiveness and
robustness of the proposed method. More than 20
copies or near-duplicates were downloaded, which
differ in video format (.mp4, .rm, .wmv, .flv, etc.),
spatial resolution (1920 * 1080, 1080 * 720, 720 * 576,
etc.), video length (such as short clips cut from a full video),
and so on. Part of the results obtained from the downloaded
mp4 video are shown. Most key frames
cover the video content exactly. There are also some
frames with similar content, such as the three frames in the
2nd and 3rd rows. These frames differ in background
color, especially the bubble lights. The final key
frames are therefore extracted from the alternative key frames
based on the structural difference. In general, these final key frames
meet the three conditions: the frame content can be
viewed clearly, their order is consistent with the original
video, and there is appropriate redundancy. Secondly, three
different versions of SUPER DIVA were tested to get the final
key frames. They differ in format or resolution and
are denoted V1 (.mp4, 640 * 352), V2 (.flv, 608 * 448), and
V3 (.avi, 512 * 288). Generally, each set of key frames is
consistent with the others, covering almost the same
video content along the same time line. The differences between
the key frame sets may arise because the same feature
difference thresholds, Tc and Ts, were used for all versions. Thirdly,
the optimization step based on the number of key frames was tested. The
original video is a short promotion trailer for a famous
movie. There is almost no feature difference among the
original frames, only the mouth movements and a few hand
movements of the presenter. So no key frames are extracted
based on the color and structure information. Therefore, the
optimization based on a fixed time interval is needed in
order to satisfy the key frame demand and to enable the
subsequent processes of video copyright detection.

D. Conclusion 

A key frame extraction method based on frame difference
with low-level features is proposed for video copyright
protection. Specifically, a two-stage method is used to extract
accurate key frames that cover the content of the whole video
sequence. Firstly, an alternative sequence is obtained based
on the color characteristic difference between adjacent frames
of the original sequence. Secondly, the final key frame
sequence is obtained by analyzing the structural characteristic
difference between adjacent frames of the alternative
sequence. Thirdly, an optimization step based on the
number of final key frames is added in order to ensure
effectiveness for video copyright protection processes.
Tests on several television videos with different content,
formats, and resolutions show that the proposed
method has advantages in computational complexity and
in robustness to video format, video resolution, and so
on.